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Abstract 



A technique is described for removing interference from a signal of interest ( "chan- 
nel 1") which is one of a set of ./V time-domain instrumental signals ("channels 1 to 



iV"). We assume that channel 1 is a linear combination of "true" signal plus noise, 
and that the "true" signal is not correlated with the noise. We also assume that 
part of this noise is produced, in a poorly-understood way, by the environment, and 
that the environment is monitored by channels 2 to N. Finally, we assume that the 
contribution of channel n to channel 1 is described by an (unknown!) linear transfer 
function R n (t — t'). Our technique estimates the Ri and provides a way to subtract the 
. environmental contamination from channel 1, giving an estimate of the "true" signal 

which minimizes its variance. It also provides some insights into how the environment 
is contaminating the signal of interest. The method is illustrated with data from a 
prototype interferometric gravitational-wave detector, in which the channel of interest 
q-( (differential displacement) is heavily contaminated by environmental noise (magnetic 

and seismic noise) and laser frequency noise but where the coupling between these 
signals is not known in advance. 

> : 

^ '. 1 Introduction 

a, 

There are many situations of interest in which data are contaminated by the environment. 
Often this contamination is understood, and by monitoring the environment it is possible 
to "clean up" or "reduce" the data, by subtracting the effects of the environment from the 
signal or signals of interest. Examples include measurements of the earth's magnetic field 
contaminated by harmonics of 60 Hz, or a telephone conversation carried on a transmission 
line, which has been corrupted by electro-magnetic cross-talk from nearby lines. The work 
in this paper was motivated by another example: the data stream from an interferometric 
gravitational radiation detector [0. In this instance, the signal of interest is the differential 
displacement of suspended test masses. A small part of this displacement arises from gravita- 
tional waves, but there are also large contributions arising from contaminating sources, such 
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as the shaking of the optical tables (seismic noise) and forces due to ambient environmental 
magnetic fields. Particularly at low frequencies, these types of ambient environmental noise 
are the fundamental effects limiting the sensitivity of the instrument |2| . The key point here 
is that the gravitational waves are not correlated with any of these environmental artifacts. 

In many such situations, it is possible to monitor the environment, offering the hope of 
removing from the signal of interest the contaminating effects of the environment. For the 
prototype gravitational wave detector used as an example in this paper B, about a dozen 
of these environmental signals were monitored, including components of the magnetic field, 
acoustic pressure, acceleration of the optical suspension, and so on ||. It is not hard to 
see that in many cases, these environmental fields add directly into the signal of interest, 
after convolution with some (unknown) response function. For example the suspension of 
the optical elements of the interferometer may be physically modeled by a coupled set of 
masses, springs, and frictional elements (dashpots), and thus acts as a mechanical filtering 
device. The displacement of the ground is filtered through this suspension and the resulting 
displacement is added into the one arising from any gravitational waves. Thus if the ground 
displacement were monitored, and if we knew the exact transfer function of the suspension, 
we could remove from the differential displacement signal the part due to ground motion. 

The difficulty here is that these transfer functions are not known, and can not be accu- 
rately calculated from first principles. For example the mechanical filters which isolate the 
suspension from the ground contain non-ideal springs, damping elements whose restoring 
forces are not proportional to velocity, and so on. It might in principle be possible to mea- 
sure these transfer functions (for example by shaking the ground in a controlled way) but 
in many cases this is not practical. 

2 Notation 

Although our methods could be generalized to the case of continuous-in-time signals, we 
will assume from here on that all the signals are discretely sampled in time. We will assume 
that the raw data (channels 1 to N) are time series, sampled at regular intervals At. We do 
not assume that these sample rates are the same for all the channels, so in particular (At) n 
will denote the sample rate of the n'th channel. The M n (assumed even) different sample 
values of channel n at regular time intervals will be denoted by 

Yn(j) = vame °f channel n at time t = j(At) n (1) 
for j = 0,---,M n -l. 

We assume that each of the channels has been sampled over the entire time interval t G [0, T] 
and thus that T = M n (At) n has the same invariant value for all channels n = 1, ■ • ■ , N. 
Because the primary goal of our technique is to extract an approximation of the "true" or 
"uncontaminated" values of channel 1, we adopt a special notation for this channel, and use 

m = Y 1 u) (2) 

to denote the signal of interest. 

Our methods assume that the contamination of channel 1 by the other channels is de- 
scribed by linear filters or transfer functions. The action of a linear filter (convolution in the 
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Figure 1: The case where the instrument's output consists of only two channels (X and 
Y) is quite simple. The "true" value which the channel of interest is designed to measure 
is denoted by x. The actual instrument output for the channel of interest is denoted by X. 
It is a linear combination of x and the environmental variable y 2 = Y 2 , convolved with the 
response function R 2 . By assuming that x and y 2 are not correlated, we can estimate the 
value of R 2 and thus estimate the value of x. 



time domain) is most simply represented in the frequency domain (where it is just multipli- 
cation), and thus much of our work will take place in the frequency domain. The Discrete 
Fourier Transforms (DFT) of the channels will be denoted by 

Mn-l ( At, \ 

Y n (k) = £ exp (arri^-J YJJ) (3) 
for k = -M n /2, • • • , M n /2. 

The index k labels frequency bins, and in particular the fc'th bin of channel n corresponds 
to a frequency 

/(«,*) = k/T. (4) 

Note that throughout this paper, the word band is used to denote a collection of adjacent 
frequency bins. We assume that the raw signals (channels) are real values, i.e. that the 
Y n (k) are real, which implies that Y n (k) = Y*(—k) where "*" denotes complex conjugate. 



3 Model (two-channel case) 

We begin by examining the case of only two channels. This is a good way to introduce 
the main ideas of the analysis and the principal techniques. In Section [7] we generalize this 
method to the iV-channel case. 

Our model may be though of in terms of the diagram in Figure |I[ The output of 
the instrument, in other words, the actual sample values produced by the experiment, are 
denoted by X and Y 2 , in the notation introduced earlier. These values are to be thought of as 
"imperfect" representations of some true values which are the variables that the experiment 
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or instrument is attempting to measure. However the actual instrumental outputs are not 
exactly equal to these values, because they have been contaminated by the environment. We 
denote the true value which the instrument is attempting to measure by x, and the actual 
output of the instrument by X. In the two-channel case, the environmental monitor channel 
is denoted by Y 2 ] without loss of generality we may assume that it is equal to the actual 
value of some environmental variable y 2 - 

In our model, we assume that the true value x of the desired signal is not present at the 
output, because it is contaminated by the environment. We assume that this contamination 
may be represented by a linear filter applied to the environmental variable y 2 - For example, 
suppose that x is the temperature of a sample of material (if that material is surrounded 
by a constant temperature environment) but in fact the temperature of the environment Y 2 
is not constant, and varies with time. The influence of the environment on the measured 
temperature X of the sample is complicated by the fact that the heat from the environment 
must diffuse through a thermal insulator before reaching the sample, so that a change in the 
temperature of the environment is not immediately reflected in a change in the temperature 
of the sample. In this example, the effect of the environment on the sample temperature 
may be modeled by a first-order linear filter, whose impulse response decays exponentially 
in a thermal diffusion time. 

In the example that served to motivate this paper, the desired signal is the difference x 
in optical phase between two paths of a suspended interferometer produced by gravitational 
waves. However the instrument contains steering magnets, which are sensitive to the ambient 
magnetic fields in the laboratory: these magnetic fields result in forces on the optical elements 
which also change the optical phase. Assuming that the geometry of the laboratory and of the 
instrument (which serves to convert magnetic fields into magnetic gradients) is not changing 
with time, one would expect to find a linear filter relationship between some component 
of the laboratory magnetic field and the output X of the relative optical phase channel. 
Similar effect arise from seismic motion and from other sources. 

4 Method (two channel case) 

The basic idea of our method is to estimate the transfer functions Ri. This is most easily 
illustrated in the two-channel case. The situation of interest is one in which the transfer 
function does not change with time, or changes slowly with time. This is the case if it 
is defined by spring constants (i.e. mechanical coupling) or mutual inductances (electrical 
cross-talk) or other quantities that depend upon geometrical and mechanical properties 
which change slowly (adiabatically) with time. 

To estimate the transfer function requires an averaging process. It might seem natural to 
average in time, but the calculations are easier to understand and express if the averaging is 
carried out in frequency space instead. For this reason, we imagine that the frequency space 
occupied by our signal (which for the n'th channel is R Mn ) is broken up into subspaces 
that span frequency bands. To introduce the notation, we first consider the channel of 
interest, X. For future convenience we will assume that the Nyquist frequency bin X(Mi/2) 
does not contain any useful information (i.e. that an anti-aliasing filter was used in taking 
the data) and that we can project our signal onto the R M ™ _1 dimensional subspace that 
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does not include this frequency bin. For notational purposes, write this frequency-space 
representation as the vector 



X 



X(0),X(l),---,X(M 1 /2-l)\ (5) 

X (°) ..^x^- 1 )] 



where we have decomposed the R 1 ^ 1 into a set of Si orthogonal vector spaces, each of which 
contains only the frequencies in a particular band b — 0, • • • , B\— 1. The number of individual 
frequency "bins" contained in one of the frequency bands is (the dimensionless integer) F, 
and B n F = M n /2 for n = 1, ■ • • , N. The number of frequency bands M n /2F does depend 
upon the channel number (or sample rate) but the number of bins F in a given band does 
not. Thus, the vector X^ = X(0),X(1), • • • ,X(F — 1) . In general, the vector associated 

with frequency band b consists of X® = [x(bF),X(bF + 1), ■ ■ ■ , X((b + 1)F - 1)1 . The 
frequency band labeled by the dimensionless index b spans a range of physical frequency / 
(in cycles/unit-time) given by the half-open interval 

bF 

/e[/ 6 ,/ 6+1 )with/ 6 = — . (6) 

Later, we will discuss how we choose the number of frequency bands. This is related to the 
question of how much averaging is needed to accurately estimate the transfer functions. 

This notation generalizes in the obvious way to the other channels Y 2 , • • • , Y/v. Note that 
the number of real degrees of freedom of the n'th channel is M n . The complex coefficients 
Y n {i) for i = 1, • • • , M n /2 — 1 contain M n — 2 of those real degrees of freedom. The coefficients 
Y n (0) and Y n (M n /2) are both real and contain the remaining two real degrees of freedom. As 
before (with no significant loss of generality) we will assume that Y n (M n /2) is zero, because 
an anti-aliasing filter has eliminated any signal contributions near the Nyquist frequency. 

To express the correlation between two channels (or the auto-correlation of a channel 
with itself) it is useful to introduce a bi-linear inner product. This is defined by 

(y£\y£>)= £ Y ni (k)f: 2 (k). (7) 

bF<k<(b+l)F 

This is just the ordinary Cartesian inner product between the two vectors, after they 
have been projected into the subspace spanned by the 6'th frequency band. The quan- 
tity ^X®, X^J is the power spectrum of channel X, summed over the fe'th frequency band: 
the total power in the 6'th frequency band. Notice that the inner product is only defined 
if both channels rti and n 2 are sampled quickly enough so that both of them extend up to 
the 6'th frequency band. If the frequency band lies above the Nyquist frequency of either 
channel, the inner product is not defined. Note also that we could define another inner 
product, which is the ordinary Cartesian one (with no projection) by summing (Y®, Y^J 
over the range b = 0, • • • , B min = min(£? ni , B n2 ), but this is used so little that it's not worth 
the trouble. 

We are now prepared to estimate the transfer function R 2 {f) shown in Figure |l| Our 
goal in doing this is to estimate the "true" channel of interest x. We denote the estimate of 
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this quantity with an overbar: x. We also use the overbar to denote our estimates of derived 
quantities, for example x. 

We assume that R 2 (f) is complex constant within each frequency band b, in other words 
that the transfer function does not vary rapidly over the frequency bandwidth Af = F/T. 
For notational convenience, let us denote the constant value of R2U) m a given frequency 
band by r^ b \ Given the transfer function within the frequency band, our estimate of the 
Fourier transform of the "true" channel of interest is 

~W = X(")- r WY 2 (fe) . (8) 

We assume that the best estimate of the transfer function in the frequency band 
b is the one that minimizes the norm N = fx^,x^). Notice that although the vector 



x^ contains F components, our estimated transfer function is a single complex number, 
containing in practice many fewer degrees of freedom than x^. In this way, the value of 
the transfer function averages over the different frequency bins within the band b, and thus 
corresponds to a time average. 

To find n ' we minimize the norm Under an arbitrary variation 5r^ > 

one has 

SN = -(sr^Y 2 (b) ,X^-r^Y 2 (b) 



±( b )-r^Y 2 (b \5r^Y 2 (b) 
= -Sr^(Y 2 ib \x^-r^Y 2 {b ^-CC 
tfr<« (Y 2 W X( 6 )-r( fo )Y 2 (6) ) 



= -23? 



(9) 



where "CC" denotes the complex conjugate of the previous term. In order that SN vanish 
for all choices of the complex number Sr^ the inner product appearing on the final line 
must vanish: 

(Y 2 (6) ,xW-rWY 2 (6) ) =0. (10) 

The unique solution to this equation gives our best estimate of the transfer function R 2 (f) 
in the frequency band b as: 

(^ b \Y 2 (b) ) 

(Y 2 «",Y 2 m ) (U) 

We note that instead of minimizing the inner product of our estimate of the true channel 
of interest independently within each given frequency band b, we could also have minimized 
the inner product defined as a sum over all B min frequency bins; this gives the same result 
since vectors obtained by projection onto orthogonal subspaces (corresponding to different 
frequency bands) have zero inner product. 

How effective is this procedure likely to be? Clearly, this depends upon how much con- 
tamination there is, in the channel of interest, and upon how well the different environmental 
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Figure 2: The estimated correlations between the IFO_DMRO channel (X) and the other 
11 environmental channels. Each individual graph has vertical scale to 1. 



signals monitor the different sources of contamination. In order to quantify these effects, it 
is useful to introduce the covariance coefficient between channels i and j in frequency band 
b, which is defined by 



0) 
Pij 



^(Yf',Yf)(Yf,Yf)' 



(12) 



From the definition it follows that < pfj < 1. This quantity may be interpreted as the 
(absolute value of) the cosine of the angle between the vectors representing the i'th and 
j'th channels. When p$ is close to unity this means that the z'th and j'th channels are 
very correlated or anticorrelated; when close to zero this means that there is no statistically 
significant (anti) correlation. The question "how large a p[f is statistically significant" will 

be addressed in Section |9|. The covariance coefficients pfj between the IFO channel (X = Yi) 
and the other 11 environmental channels Y 2 , - ■ ■ , Y" 12 are shown in Figure 0. 

There are a number of interesting features in the graph of p\] that are worth brief 
comment. 

• The magnetometer output shows beautifully strong correlation with the IFCLDMRO 
at all multiples of the line frequency of 60 Hz. The large ambient magnetic fields in 
the laboratory are probably being produced by motors in the ventilation system and 
transformers in the argon laser power supply. 

• The correlations between the microphone and the IFCLDMRO may reflect mechanical 
resonances in the mechanical suspension and isolation systems which are driven by 
ambient acoustic noise. 
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• The DC strain is a low-pass filtered version of the IFO_DMRO channel of interest: 
channel X\ So in fact it has excellent low-frequency correlation with the IFO_DMRO 
channel because these are measuring essentially the same thing. Note that this channel 
will be left out of the decontamination procedure that we describe, since that procedure 
is intended only for signals that should not show any intrinsic correlation with the true 
quantity of interest. 

• The mode cleaner correlation is easy to understand. It occurs because the mode cleaner 
removes most but not all of the laser frequency noise. The remaining frequency noise 
is converted by the interferometer into an effective change in the arm length. 

• The seismometer shows interesting and significant low-frequency correlation with the 
IFO_DMRO. The mechanical suspension does not entirely isolate the instrument from 
ground motions, and these are subsequently converted into motions of the suspended 
masses. These low-frequency correlations are precisely the sort of correlations that 
will be removed by the procedure described in this paper. 

• The arm 2 visibility and the slow pzt show almost identical correlations with the 
IFO_DMRO channel. We do not understand why. 

• The arm 1 coil driver shows very clear low-frequency correlations with the IFO_DMRO. 
These may be related to the previously-described correlation between the mode cleaner 
and the IFCLDMRO. 

A technique for simultaneous removal of all of these correlations from the IFO_DMRO is 
described in Section but for the moment we return to the simplest, two-channel case. 

In the two-channel case, the transfer function in frequency band b was estimated by 
minimizing the norm N = (x (6) ,x (6) ). This led to a unique solution for the estimated 
transfer function r^ b \ given by (11). How much is the norm A" reduced when compared with 
the corresponding norm of the original channel of interest fX®, X^ 6 M before any correlations 

were removed? This may be found by substituting the value of r^ h > (|ll|) into the definition 
of N. One obtains 



N = 



= <5) ,(6) 



r (6) f X W,Y 2 (fe) V + |r^| 2 fY 2 (6) ,Y 2 (6) 



(6) 



X M Y 2 (6) 



Y 2 (6) ,Y 2 (&) 
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(XWY 2 W ) 


2 


(x( fe ),xw) 


(Y a W ,Y a W ) 



« {b) V 

P12 J 



(13) 



The fractional reduction in the norm N is 1 — (pi? ) • Thus, if an environmental channel 
is strongly correlated with the channel of interest, a significant reduction in the norm is 
obtained. As discussed following equation (|7]) this is may be though of as a reduction in the 
total power in the 6'th frequency band. 



5 An Example (two channel case) 

Our example (including Figure |2|) is based on data from the Caltech 40-meter prototype 
gravitational wave interferometer ||. During one week in November 1994, this instrument 
was used to collect data for later analysis. Between eleven and fourteen channels of data 
were collected. The channel of interest X is the InterFerOmeter Differential Mode Read-out 
(IFO_DMRO) and the other sampled channels consist of environmental and instrumental 
monitors. The channels were sampled at either 9868.42 ■ • • Hz (fast channels) or at one-tenth 
that rate (slow channels). 

In our first example, we consider only two channels: X — Y\ is the IFOJ3MRO and 
Y 2 is IFO_Mag_x. This is the x— component of the magnetic field sampled near one of the 
optical elements denoted. Both of these signals are sampled at the fast rate. We used 
Mi = M 2 = 10 x 2048 x 128 samples from the 18 November 1994 run 1 data set, spanning 
approximately 266 seconds. To carry out the averaging we choose F = 128 frequency bins in 
each of B\ = B 2 = 10 x 2048 frequency bands. This is the same data set whose correlations 
with the IFO_DMRO are illustrated in Figure |[ 

There is particular reason to believe that the IFCLDMRO is strongly contaminated by 
ambient magnetic field noise (or by signals which are correlated to that). This is because 
the optical elements of the interferometer suspension are steered and controlled by magnetic 
forces. Many of the optical elements have magnets fastened to them, and small coils are 
used to provide some of the servo feedback used to maintain the optical resonance of the 
interferometer. The laboratory magnetic fields arise from a number of sources, including 
motors which are part of the air-circulation systems in the laboratory, and power-mains and 
power-supply wiring such as the three-phase current driving the argon-laser power supply. 
It is also possible that ripple from the power supplies is present in the servo loops whose 
error outputs are the source of the IFCLDMRO signal. 

Figure |3] shows the two channels X and Y 2 for 266 seconds. Because our primary goal 
is to remove low-frequency (/ < 987/2 Hz) contamination from X, these channels have 
been low-pass filtered by (1) transforming into the frequency domain (2) setting to zero all 
spectral amplitudes at frequencies > 0.1 /Nyquist then (3) transforming back into the time- 
domain. Although it is not obvious from the graphs, both channels contain strong sinusoidal 
components at multiples of the line frequency 60 Hz. Notice that the rms value of channel 
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Figure 3: Approximately 266 seconds of two channels of the Caltech 40-meter interferometer 
output, after low-pass filtering to remove all frequency components above 0.1 /N yqu ist- X 
denotes the Differential Mode Readout, which is the channel of interest. Y 2 is the output 
of a magnetometer, sensing a component of the local magnetic field. These two signals are 
both contaminated by many harmonics of 60 Hz. They are shifted by ±80 ADC counts for 
clarity. 



X is about 30 ADC counts. Also notice that the small instrumental feature (blip) around 
t = 46 sec is almost obscured by the surrounding "hash" . The Fourier transforms of these 
two channels are shown in Figure |j. Using the procedure that we have described, we can 
estimate the coupling R^{f) between these two channels. This is shown in Figure [5]. In 
each frequency band, the estimate of R 2 is a sum over the F = 128 different frequency bins 
contained in that band. If there is no correlation between the two channels, the expected 
value of this sum behaves like a random walk, accumulating proportional to y/F. (The case 
where there is no correlation is considered in detail in Section ||) In frequency bands where 
the two channels are correlated, the expected value of the sum accumulates proportional to 
F. 

The final result, Figure |5] shows the estimated "true" value of the IFO Differential Mode 
Output channel, after subtracting the estimated crosstalk. 

6 Method (TV-channel case) 

In Section || we showed how it is possible to obtain an estimate of the coupling between 
two channels, by searching for the linear combination (in frequency space) that minimizes 
the variance of the channel of interest. In this section, we generalize this method to the 
iV-channel case, where in addition to the channel of interest, there are N — 1 additional 
environmental monitoring channels. 
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Figure 4: The amplitude spectrum of the data sets from Figure |3|. Notice that there are 
strong line-like features at the harmonics of 60 Hz, particularly around 180 and 300 Hz in the 
channel of interest. The former may be due to the laser's power supply producing cross-talk 
in other electronics. This graph shows only frequencies < 0.1 /Nyquist- 
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Figure 5: The estimated coupling function Ri{f) between the IFO channel (X) and the 
magnetometer (Y 2 ). This estimate is dominated by noise, except at frequencies where its 
modulus is large compared to nearby frequencies. At these frequencies, the estimate is 
accurate. The frequencies at which R 2 can be accurately estimated includes (but is not 
limited to) many of the 60 Hz line harmonics. 
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Figure 6: The final result of the estimation process that is described is an estimate of 
the true value of the IFO_DMRO channel after subtraction of the correlated contamination. 
This should be compared with the original time/frequency domain data shown in Figures |3| 
and |j. Notice that the rms of the time-domain signal has been reduced by about a factor of 
six! There has been a significant reduction in the spectral content of the signal around 180 
and 300 Hz, and the instrumental effect around 46 seconds is much more apparent. 



The basic idea is identical. We estimate the "true" value of the channel of interest as: 



-(&) 
x 



N 



(6)V W 



J=2 



(14) 



Here the are a set of N — 1 coupling constants: they are our estimates of the contribution 
that the channel Yj makes to the channel of interest X in the frequency band b. As before, 
we choose these coupling constants in the way that minimizes the total power in the channel 
of interest, assuming that they are constant throughout the 6'th frequency band. This means 
that we choose the r^p in order to minimize the expected value of the norm iV 

under an arbitrary variation 5rf\ 



,( 6 ) = W 



5N 



N 

E 

J=2 



- I 8rf ] Yj (6) *W 



fe=2 / 
N / / N \ 

E Yj (6) ,XW - £ rf } Y k (fe) - CC 

j=2 V \ fc=2 / 

7=2 \ fc=2 / 



(15) 
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In order for this quantity to vanish under all variations of the N — 1 coupling constants 5rf^ 
one must satisfy N — 1 equations (for j = 2, • ■ ■ , N) : 

(Y^P-Erf^^O. (16) 

V k=2 ) 

This may be conveniently written in matrix/vector form. To do so, define the correlation 
matrix estimate in the b 'th channel: 

Gj? = (Y/W) . (17) 



This matrix Cj^ is Hermitian and positive semi-definite. Notice that the entries of this 
correlation matrix are defined for j,k — 1, • • • , N since by definition the channel of interest 
X = Y\. This means that "intrinsically" C is a square N x N matrix. 
The equations satisfied by the coefficients (|l6f) may now be written as 

N 

Cj? = £cS\JW (18) 

k=2 

Notice that the left hand side is determined by the correlations between the channel of 
interest and the environmental channels. The matrix that appears on the right hand side 
is determined by the correlation between the different environmental channels. In the case 
where these are not correlated (i.e., a given environmental channel is only correlated with 
itself) then the matrix on the right hand side is diagonal, and the situation is very similar 
to the two-channel case. 

If all of the channels are non-zero in at least one bin in frequency band b then the matrix 
is Hermitian and positive definite, so that it may be inverted. We denote the inverse of this 
matrix by the symbol C _1 . Note: this is not the inverse of an iV x iV matrix. It is the 
inverse of the (N — 1) x (N — 1) matrix defined by (|T7|) for j, k = 2, ■ ■ • , N. 

The coupling constants that minimize the variance in the channel of interest are now 
given by: 

rf = Y,(C~ 1 ) C kl for j = 2, (19) 

k=2 3 

Although it is tempting to interpret this equation as "inverse of a matrix times the matrix" 
and replace the rhs by 5jl this is not correct, because C~ x is the inverse of an (AT— 1) x (AT— 1) 
matrix. 

It is again possible to ask how much the norm A" is reduced when compared with the 
corresponding norm of the original channel of interest fX^,X^J before any correlations 

were removed. This may be found by substituting the value of (|TTD into the definition 
of N. One obtains 

N EE fx (6) ,X W 



x«-f; r j 6) Y J (k \xw-f;rfY J (6) 

3=2 j=2 
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= (xW,XW)[l-|pW| 2 ]. (20) 
where the second line follows from Eq. [16| and we have defined 



N 



\ P (b) \ 2 = £»f 

i=2 

TV /V 



y/ 6) ,xw 



x( fe ),x(^ 

EE^M/Mrf. (21) 

i=2 fc=2 

The second form expresses | 2 in manifestly positive definite form, while from its definition 
it is always less than or equal to 1. The quantity |p^| 2 provides a useful measure of the 
total improvement in the signal. To understand which environmental channels led to this 
improvement one may study the N — 1 pairwise covariance coefficients \pfj\ 2 - 

7 Example (TV-channel case) 

Our iV-channel example is based on the same 18 November 1994 run 1 data from the Caltech 
40-meter prototype gravitational wave interferometer || that was used in the previous 2- 
channel example in Section ||. As before, the channel of interest X is the InterFerOmeter 
Differential Mode Read-out (IFO_DMRO) and the other 11 sampled channels consist of 
environmental and instrumental monitors. Three of the channels (including IFO_DMRO) 
were sampled at the fast rate of 9868.42 • ■ ■ Hz and the other nine were sampled at exactly 
one-tenth that rate. The different channels are shown in Table |l| The covariance coefficients 
pij between these channels and the IFO_DMRO channel were previously shown in Figure |2|. 

As before, we used the first M± = M 2 = 10 x 2048 x 128 samples from the data set, 
covering about 266 seconds. As before, to carry out the averaging we choose F = 128 
frequency bins in each of B% = B 2 = 10 X 2048 frequency bands. Because the DC strain 
channel is effectively just a low-pass filtered version of the IFO_DMRO channel, it was left out 
of the removal process. The result of this procedure is shown in Figure [?[ It is very useful 
to compare this with the previous 2-channel case, where we removed only contamination 
that was correlated with the magnetometer channel. In comparison to this previous case, 
the following features are evident: 

• The "end effects" that are apparent in the two channel case (Figure |B|) are no longer 
present: these were not end effects but contamination of the IFO_DMRO channel by 
an interfering signal. 

• Comparison of the power spectra (Figures ^J§,0) in the region below 60 Hz shows that 
a significant reduction in low-frequency content has been obtained. 

• The time-domain properties of the estimated detector noise are now more uniform, 
rather than less uniform. This is good evidence that the signal content which is being 
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Table V. Channel assignments for the November 1994 data runs. Channels 0-3 are the "fast" 
channels, sampled at about 10 kHz; the remaining twelve are the "slow" channels, sampled 
at about lKHz. Note that the power stabilizer channel was accidentally disconnected until 
approximately 20:00 local time and so was not used by us, and that some channel numbers 
were not present in the data. 
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Figure 7: The final result of the estimation process, to remove contamination from 11 
monitored environmental channels from the IFO_DMRO channel. This should be compared 
with the original time/frequency domain data shown in Figures |] and |j. 
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removed is in fact a true correlated signal and not merely an artifact of the subtraction 
procedure described here. 

8 Reducing the effects of correlated sidelobes 

When we began this work, our original intent had been to carry out a procedure similar to 
the one just described. However that procedure failed, for reasons that are interesting, and 
are worth explaining here. 

The procedure which failed can be summarized as follows: 

• Take long stretches of data from each of the N channels, spanning a time interval of 
length T. 

• Cut them into T/r short segments of length r (say, one second long) . 

• Transform these into the frequency domain. 

• For each short segment, and in each frequency bin, calculate an N x N matrix con- 
taining the products of the Fourier amplitudes of the different channels. 

• In each frequency bin, average the T/r matrices thus obtained to get an estimate of 
the correlation matrix. 

• Use this correlation matrix to estimate the transfer function Rj that minimizes the 
total power in each frequency band. 

The reason why this procedure failed is not hard to understand. 

One might expect that in this procedure, since the length of each segment in the time- 
domain is r, the frequency-resolution of this method is Af ~ r _1 . Thus, for example, the 
line-frequency harmonics appearing at multiples of 60 Hz might be expected to be resolved 
within a band about ±1 Hz about their true locations. This is correct. 

The problem occurs because in many instances, these line-like features in the frequency 
domain have much larger amplitude (by orders of magnitude) than the neighboring frequency 
bins. In addition, these line-like features do not lie precisely at the center of a frequency 
bin (in the time domain, the corresponding sinusoids do not undergo an integer number 
of oscillations during the time-interval r). Consequently, these line-like features exhibit 
sidelobes of the windowing function. In the method that we have described, this windowing 
function is rectangular (on or off) but even if a more sophisticated and smoothly-varying 
window function is chosen, the sidelobes are still present. These sidelobes are much smaller 
than the central maximum, and depending upon the choice of window function, they fall off 
as some (inverse) power of the separation in frequency bins from the central line. Since the 
energy in the central line is so large compared with neighboring frequencies, these sidelobes, 
while insignificant compared with the central line feature, are still large enough to completely 
dominate the signals at neighboring frequencies. Consequently, one finds that there are 
large correlations arising from the central line-like features, extending out over a range of 
frequencies that is quite large compared to Af ~ r _1 . In many of the instances which we 
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Figure 8: A comparison of two methods of estimating the transfer function i? 12 . The 
dark curve shows the method used in this paper: in each frequency band b the estimate is 
constructed as an average over nearby frequency bins. The light curve shows the method 
that failed: it is essentially the time average of a single frequency bin in a sequence of 
short Fourier transforms. It fails because sidelobes of strong line features cause spurious 
correlations over a much wider range of frequencies than Af = 1/T. 

examined, these correlated sidelobes dominate the true correlation out to 50 Af. This is 
shown in Figure |8|. 

The failure of this other method may be easily summarized as follows. Although the 
energy arising from a sinusoidal signal present in several channels is largely confined to a 
(small) bandwidth Af, the correlation arising from this signal can dominate the correlation 
over a bandwidth which is fifty times larger! The resulting loss in frequency resolution is 
unacceptable. For this reason, we don't use (or recommend!) this method for estimating 
the correlations between different channels. 

9 Avoiding False Dismissal of "Correlations" (two-channel 

case) 

The methods that we have described for removing environmental contamination or crosstalk 
from signals of interest assumes that there is no correlation between the environmental 
monitors and the signal of interest, and thus that any correlation which is found is due to 
"leakage" or "crosstalk" in the instrument. If this assumption is satisfied, one might well 
ask, "Is there a danger of falsely removing correlations which do not in fact exist in the 
observed signals?" To quantify this requires that we make assumptions about the statistics 
of any uncorrelated signals. 

Suppose that we consider the case where the iV channels are independent uncorrelated 
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Gaussian random variables, with a white power spectrum. For simplicity let us also assume 
that each has zero mean value and unit variance. This is a situation where a good technique 
for removal of correlated noise from the channel of interest should do absolutely nothing, 
since there is no correlation to remove! How does the technique described here perform in 
this situation? 

For simplicity, consider first the two-channel case of Section f|. Suppose that the signal 
values X(i) and I2C7) are independent Gaussian random variables with mean value zero 
and unit variance. In this case, the Fourier amplitudes X(i) and Y 2 (j) are also independent 
Gaussian random variables. The estimated transfer function ( |TTD for a particular frequency 
band has mean value zero. The expectation value of its square is given by 
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The calculation of the last quantity is slightly complicated and may be found in Appendix 
Here, we approximate it in the case where the frequency band (b) contains many frequency 
bins. Since the number of frequency bins in the 6'th frequency band is denoted by F, we 
will assume that F » 1. In this case one obtains 
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Because the estimated transfer function r and the covariance p 2 are related by 
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our simple example of two uncorrelated channels would have the expectation value of p\ 2 in 
each frequency band equal to l/F. Thus, on the average, blind application of our method 
would reduce the variance of the channel of interest by the fraction 
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(25) 



This is clearly unacceptable since there is no correlation actually present, and the power 
in the channel of interest should not be reduced at all. In the case where we have two 
uncorrelated Gaussian random channels, with for example F = 128, the direct application 
of the method described here will reduce the power in the channel of interest by almost one 
percent! 
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The problem that we are describing is that of incorrectly or falsely removing correlations 
that are not really present! If the length of the data set were extremely long, so that the 
number F of frequency bins in any given frequency band were very large, then this problem 
would disappear. However in practical work, it is unreasonable to have very large numbers 
of frequency bins F in each band. 

One simple solution to this problem is to threshold on the covariance. In other words, 
we examine each environmental channel in turn, and ask if it is correlated with the channel 
of interest. If such a covariance is present at a statistically- significant level, the correlation 
is removed. Otherwise, the correlation is not removed. Since the expectation value of p 2 is 
1/F, we can set a threshold of say 10/F 



10 Avoiding False Dismissal of "Correlations" (n-channel 
case) 

In the N-channel case, there is also a risk of falsely removing "correlations" that are not 
present. In Section^, we introduced the correlation matrix by equation (|17]). All the following 

calculations are based on that matrix. Each entry of that matrix, = \ ~Y\} is 

the correlation between channels i and j in frequency band b. According to Appendix [A|, 
because the number F of frequency bins in frequency band b is finite, the correlation between 
any two channels can not be calculated precisely. Consequently there is a risk of finding 
correlations when none exist, and then incorrectly removing them. 

One method to avoid false dismissal of "correlation" is to threshold on every entry of 
the correlation matrix, Cj b k . We calculate the absolute value of the covariance coefficient 

between channels j and k in frequency band 6, pf k \ which is defined by equation (|12"D. If pf k 



is smaller than some threshold value p* (for example p* — then we set the corresponding 



F I ' 

entry C K " k in the correlation matrix to zero. If p\"' is greater than the threshold value p*, 



then we leave the corresponding entry C-^ in correlation matrix unchanged. We use, Dj k \ 
to denote the correlation matrix after thresholding: 
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with the correlation matrix after thresholding, Df^ 



The next step is to calculate the coupling constants using equation flT9]), but replacing Cj fe 
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J2{D {b %DS for j = 2, • • • , N. (27) 



k=2 



Having found the coupling constants rf , one can remove the correlations from the channel 



of interest using equation (|T4]). 

There is a problem when equation fl2"7p is applied to real data. Because the thresholding 
sets entries of the correlation matrix to zero, becomes nearly singular and its inverse 
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D^ 1 in equation (|27|) becomes unstable. For example, in Figure 0, there are two channels, 
slow pzt and arm2 visibility, which are very similar to each other. When the small correlation 
elements are set to zero in the correlation matrix, the two rows corresponding to these two 
channels become very close to each other, which makes nearly singular. 

To solve this problem, we eliminate "redundant" channels. Consider the eigenvalues A 
and the eigenvectors A of the matrix . Note that the matrix is Hermitian and 

positive semi-definite. Its eigenvalues are always real and non-negative. If the matrix 

becomes close to a singular matrix, will have an eigenvalue Ao which is very close to 
zero. We call the corresponding eigenvector A . Hence, 

N 

£>< 6 >A° = A A , or £ D$A° = A°A°. (28) 

k=2 

When A is very close to zero, the rhs of equation ( |28| ) vanishes. This means there is at 
least one row in D that can be written as a linear combination of the other rows. Because 
D is the correlation matrix of different channels, this implies that at least one channel 
is a linear combination of the other channels. That channel is a redundant channel and 
gives us no useful additional information about the environment. We can eliminate that 
channel from our channel set in order to keep the correlation matrix far from singular. To 
determine the "best" channel to eliminate, we consider the absolute value of elements A° 
in the eigenvector A . If |A°| is the maximum of all the absolute values of elements in the 
eigenvector A , this means that channel k makes the maximum contribution to the null 
eigenvector. Hence, we remove channel k from the environmental channel set. Then, we 
build a new (N — l) 2 correlation matrix D from the remaining N-l channels and follow the 
same procedure described above until the eigenvalues are far away from zero. 
Let us summarize our method in steps: 

1. Threshold the correlation matrix Cj^ using equation (|26| ) to get a new correlation 
matrix D®. 

2. Calculate the eigenvalues A and the eigenvectors A of the matrix D^. 

3. Check whether there is an eigenvalue near zero. 
If not, calculate the coupling constants using equation Q27D and remove the correlations 



from the channel of interest (X) using equation (|14]). 

If there is an eigenvalue A which is close to zero, find the maximum (for example 
|A°| ) of the absolute values of elements in the corresponding eigenvector A . Then, 
eliminate the corresponding channel (for example channel k if |A°| is the maximum) 
from the channel set. That means that we eliminate the fc's row and fc'th column in 
D®. Then return to step 2. 



11 General discussion of thresholding methods 

An ideal scheme of removing correlations from the channel of interest X to obtain x should 
have the following properties: 
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1. If any environmental channel is rescaled, i.e. Yj =>- oYj, it does not affect the result 
x. 

2. If any environmental channel is duplicated, i.e. Yn+i = Yj, N =>- N + 1, it does not 
affect the result x. 

3. If any environmental channel is duplicated by a linear combination of other channels, 
i.e. Yn+i = «2Y 2 + . . . + ojatYn, N =>- N + 1, then Yn+i can be removed from the 
set of channels without affecting the result x . Of course, property 2 is just a special 
case of property 3. 

4. If the environmental channels are replaced by any linear combination of the original 
channels, i.e. Y/ = MjjYj, where M is an invertible matrix, it does not affect the 
result x. 

5. If the environmental channels are re-labeled, it does not affect the result x. This is a 
special case of condition 4, when My is a permutation matrix of the set (2, . . . , N). 

6. If an environmental channel is Gaussian noise and independent of other channels, then 
it does not affect the final result x at a statistically-significant level. 

If we do not do thresholding (when the number of frequency bins F in a frequency band 
is very large), our method has all six properties above. However, if we threshold using the 
method described in Section [H^ (when the number of frequency bins F in a frequency band 
is not large enough), our method has all the properties above except for property 4. 

We also considered two other thresholding methods. The first one is to threshold on 
individual channels. We check the absolute value of the covariance coefficient between 
channel j and channel 1 (which is the channel of interest X) in frequency band b, pf}. If p^y 
is smaller than some threshold value p* then we eliminate channel j from our channel set. 
If pji is greater than the threshold value p*, then we keep that channel in our channel set. 
This method has all the properties above except for property 4. Compared with the method 
discussed in Section [10], this method is too conservative: it does not remove all the possible 
contaminating noise. It is possible that one environmental channel Yj is not correlated 
with the channel of interest X but is correlated with another environmental channel Yk. 
Suppose channel Yk is correlated to the channel of interest X, and contributes to the removal 
of correlated noise from the channel of interest X by equation (|T4]). In this situation, if we 
include channel Yj in the channel set, it is equivalent to the following two operations. First, 
we remove the correlation between channel Yj and channel Yk from channel Y^. We call the 
result Yk. Then we remove the correlation between channel X and channel Yk from channel 
X. This is better than only removing the correlation between channel Yk and channel X 
from channel X because our estimation of the correlation between Yk and X is better than 
our estimate of the correlation between Yk and X. 

Another thresholding method is to consider the eigenvectors of the correlation matrix 
between the environmental channels. The correlation matrix is diagonalized by a similarity 
transformation, which is a unitary matrix U made up of the eigenvectors of the correlation 
matrix. 

L = U^CU (29) 
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Here, the matrix Ly is a diagonal matrix of the eigenvalues of the correlation matrix. 

( Xi if % = j, or 
La = (30) 
I otherwise . 

Construct new channels Y/ by Y/ = UijYy These channels Y/ are independent of each 
other since they have vanishing correlation. Then threshold on each channel Yi' individually 
using the method of thresholding in the two-channel case described in Section |9]. However, 
there is a problem with this apparently promising method. If any environmental channel is 
rescaled, i.e. Yj =^ aYj, the eigenvector of the correlation matrix is changed. Hence the 
independent channels that we build Y/ = t/^Yj are also changed. So this method does not 
have properties 4 and 5. One may argue that we can normalize the environmental channels 
first and then diagonalize the correlation matrix by the unitary matrix U . In this way U is 
unique. However we can not find any physical reason to use a unitary matrix to diagonalize 
the correlation matrix. If we use non-unitary matrix, it is no longer unique. To demonstrate 
that the non-unitary matrix is non-unique, construct a matrix M 

fl/\Ai ifi=j, or 
Mij = (31) 

l> otherwise . 

It is obvious that M ' = M, and / = (UMyCUM is the diagonal identity matrix. We can 
arbitrarily choose another unitary matrix U' . 

I = U\ ] {UM)^CUM)U' = (UMU'yC(UMU'). (32) 

Let matrix P = UMU' . Equation (|32| ) shows that P diagonalizes the correlation matrix C 
to a unit matrix /. Because M is non-unitary, P is non-unitary. Because of the choice of 
U' is arbitrary, P is not unique. Even when only a unitary matrix is used, there is still a 
problem. If an environmental channel is duplicated, i.e. Yn+i = Yj, N N + 1, the n 
eigenvectors of the correlation matrix are changed. This means this that method does not 
have properties 2 and 3. 

It seems difficult to find a method of thresholding which has all six desired properties. 
There is a tradeoff in choosing a suitable method. In practice, when full-scale LIGO be- 
gins operation, we expect that the methods discussed here will provide some guidance in 
choosing a suitable set of environmental signals to use in "clean up" and understanding the 
interferometer's output. We anticipate that with some experience and experimentation, it 
will not prove too difficult to identify a set of suitable channels in different frequency bands, 
and thresholds can be set based on experience and on understanding of the instrument. 



12 Conclusion 

The methods described in this paper amount 
interest is correlated with other environmental 
quantity being measured in the signal channel 



to estimating whether or not a signal of 
channels. The key assumption is that the 
should not have any correlations with the 
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environment. The correlations are removed following a prescription that minimizes the 
power in user-defined frequency bands. 

We assume that the correlations with the environment are described by linear transfer 
functions. The methods used to identify and remove these correlations are very similar to 
Principal Component Analysis (PCA) carried out in frequency space. We have used a real 
data set to demonstrate that the method is both reasonable and effective. 

When the full scale LIGO interferometers begin operation in the year 2000, there will be 
over a thousand environmental and control channels being monitored, and the problem of 
identifying and eliminating the most significant environmental contamination will be severe. 
In the end, we suspect that the methods described here will be useful in two ways. First, 
they will assist in identifying which environmental channels are having the greatest effects 
on instrument performance. The frequency dependence of these effects might be helpful 
in trying to determine how they can be alleviated or eliminated. Second, after the most 
relevant set of environmental channels have been successfully identified, these techniques 
should make it possible to "clean up" the signal, although further study will be needed to 
determine if this has undesirable side effects. 
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A probability distribution of p 2 for uncorrelated Gaus- 
sian noise 

From equation (25) in Section |9|, we know that when the two channels are independent Gaus- 



sian random variables, the method described in Section [| will falsely remove "correlations" 
which do not exist. One method to avoid this false dismissal of "correlation" is to threshold 
on the coherence p 2 defined by equation fll2]) . To set a reasonable threshold on p 2 , we need 
to know the probability distribution of p 2 for the case where X and Y are not correlated. 

To determine the probability distribution of p 2 , we first consider an F-dimensional com- 
plex Gaussian random variable Z(J) = R(j) + G 1...F, where R(j) and are 
independent real Gaussian random variables with vanishing mean and unit variance. Note 
that in order to make the notation simpler, we introduce a new symbol Z to represent 
the X or Y in previous Sections. The probability distributions are (subscript "g" means 
"Gaussian" ) 

p g (R(j)) = -Le-W/ 2 , and 
v Zn 



pMj)) = 4rt- I{j)2/2 - (33) 

V Z7T 
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Define Uz(j) = \Z(j)\ 2 = R(j) 2 + I{j) 2 - The probability distribution p u {Uz{j)) is defined 
by 



W(U)p u (U)dU = 

/ W (R 2 + I 2 )p g {R)p g {I)dRdI (34) 

-oo J — oo ^ ' 

for any choice of function W. Taking W(x) = 5(Uz{j) — x) yields 

r l e -Uz(j)/2 for > 0> or 

Pu(U z (j)) = (35) 
I for UzU) < • 

F 

Define Uz = S Uz{j)- In the F-dimensional real space spanned by (t/^(l), . . . , Uz(N)) the 
l 

joint probability distribution p(C7^(l), . . . , Uz(F)) is 



(36) 



p(t^(l), . . . , ^(F)) = p u (^(l)) . . . Pu(U z (F)) 
'iy e -u z /2 if > o for j = 1 ... F, 

otherwise. 

Now we calculate the probability distribution of p 2 assuming that X and Y are independent 
F-dimensional complex Gaussian random vectors. According to equation ([12]) , the coherence 
p 2 is defined by 



2 _ \(Zi,Z 2ji 

p ~ (z 1 ,z 1 )(z 2 ,z 2 y [6i) 

where Z\ and Z 2 are F-dimensional complex vectors. Without loss of generality, we assume 
Z\ and Z 2 both have unit norm, or (Z 1 , Z^ = (Z 2 , Z 2 ) = 1, so Uz 1 = Uz 2 = 1 • Because 
equation (|3"7| ) is rotationally invariant, we can also assume -Zi(l) = 1 and Z\{j) = for 
j = 2, . . . F. Then, 

p 2 = \Z 2 (l)\ 2 = U Z2 (l). (38) 

Thus, the probability distribution of p 2 is equal to the probability distribution of f7z(l) given 
that Uz = 1, where Z(j) is an F-dimensional complex random variable with probability 
distribution given by equation (|33D . 



,2 



p(p 2 )=p(Uz(l)\U z = l)\u z (i)= P i, (39) 
and the cumulative probability distribution 

V{P 2 > p 2 )=p(Uz(l) > p\U z = l)l =pl . (40) 
It will be easier to first determine the cumulative probability distribution p(p 2 > pg), or 

p(U z (l) > p\U z = 1) = I l p(U z (l)\U z = l)dU z (l). (41) 
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Note that to normalize this probability distribution requires p(UziX) > nWz = l)U=o = 1- 
In the P-dimensional real space spanned by (Uz(l), ■ ■ ■ , Uz(F)), the condition U z = 1 
defines part of an (F — l)-dimensional plane Pq defined by 



P : 



U z {l) + ... + U z {F) = l 
U z (j)>0 forjGl...P. 



(42) 



The region (Z7z(l) > A* an d Uz = 1) defines a part of an (P — 1) -dimensional plane P M given 
by 



P,:i 



U z (l) + ... + U z {F) = l 
U Z {1) > n 

U z (j)>0 forj'G2...F. 



(43) 



Note that P M is a subset of Po- From equation (|36|), we can see that p(Uz(l), • • • , Uz(F)) is 
a constant for any given Uz- In our case, Uz = 1- Hence > /i|£/z = 1) is just the 

ratio between the (F — l)-volume of P M and the (P — l)-volume of Po. To help calculate the 
volume of P M , we can translate the coordinates (Uz(l), ■ ■ ■ , Uz(F)) so that the origin moves 
to the point (/x, 0, . . . , 0): 



new coordinates: < 



u z (iy = u z (i) - fi 

U z (j)' = U z {j) for j G 2 ... P. 



In the P-dimensional real space spanned by the coordinates ([/^(l)', 
P M is given by 



(44) 

U Z {F)'), the plane 
(45) 

Comparing equation ([42]) and equation (|45"D, we see that P M and Po are rescaled versions of 
each other, and the linear dimension of P M equals (1 — fi) times the linear dimension of Po. 
Hence, 



P, 



u z (iy + ... + u z (Fy = i-fi 

U z (j)'>0 for j G 1 ... P. 



p{Uz{l)>n\U z = l] 



(P — l)-volume of P M 
(P — 1)- volume of P 



Thus the cumulative distribution 



p{ p 2 > pi) = (l - P l 



F-l 



(46) 



(47) 



Note that p(p 2 > pl) 



p 2 o=o 



1. Hence this probability distribution is correctly normalized. 
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Taking the derivative of equation (f47|) to get the differential probability distribution 
function yields 



p(p2) = _Mp>>pI) 



dpi 

F-2 



= (F-I)(l-P 2 ) (48) 
p(p 2 ) = (F-l)(l-p 2 ) F ~ 2 . (49) 



Hence, 



The expected value of p 2 is 

< P 2 >= J pWW = y (50) 

Now we can return to the problem identified at the beginning of this section. 

The aim was to avoid false dismissal of non-existing "correlation" by setting a reasonable 
threshold on the coherence p 2 (defined by equation (|T2])) between channels X and Y. If p 2 is 
greater than the threshold p* 2 , we conclude that the correlation between X and Y is present 
at a statistically-significant level, and remove the correlation using method described in 
Section f|. If not, we leave channel X unchanged. According to equation (|47|), when two 
channels are just independent Gaussian random variables^ the probability of incorrectly 
removing "correlation" between them is given by (1 — (p*) 2 ) • For example, when F = 128 
and (p*) 2 = 10/ F, the probability of incorrectly removing "correlation" is ~ 3 x 10~ 5 . 
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